[QNN EP] Fix 16x16 MatMul translation #24846
Conversation
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
Analysis of the failures reported in the CI pipeline: we only enabled HTP tests in the Linux environment, but that environment uses the QNN simulator, not a real device.
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
@HectorSVC Thanks in advance.
There was a fix for the Web CI pipeline; please merge the code from the latest main branch.
- QNN's 16x16 FC doesn't support an asymmetric int16 weight.
- QNN's 16x16 MatMul doesn't support an asymmetric int16 weight initializer.
- Insert a Convert op to convert the asymmetric uint16 weight to a symmetric int16 weight.
- Add unit tests to verify 16x16 MatMul translations.
- MatMul 16x16 is supported only on some hardware.
- Disable the MatMul 16x16 unit tests on Linux platforms.
Force-pushed from aae6e64 to 698f087
@HectorSVC Thanks for the input. I merged the code from the latest main branch. Could you please re-trigger the CI pipeline?
/azp run Linux QNN CI Pipeline,Win_TRT_Minimal_CUDA_Test_CI,Windows ARM64 QNN CI Pipeline,Windows GPU Doc Gen CI Pipeline,Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 5 pipeline(s).
### Description

- QNN's 16x16 FC doesn't support an asymmetric int16 weight.
- QNN's 16x16 MatMul doesn't support an asymmetric int16 weight initializer.
- Insert a Convert op to convert the asymmetric uint16 weight to a symmetric int16 weight.
- Add unit tests to verify 16x16 MatMul translations.

### Motivation and Context

- This fix schedules 16x16 MatMul ops on the QNN HTP accelerator.
- This improves the inference time of models containing 16x16 MatMul operators.
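Numerically, inserting a Convert op here amounts to requantizing the weight: an asymmetric uint16 tensor (scale `s`, zero point `zp`) is mapped to a symmetric int16 tensor (same scale, zero point forced to 0). Below is a minimal numpy sketch of that mapping; the function name and the use of numpy are illustrative only and are not the QNN EP's actual implementation:

```python
import numpy as np

def convert_u16_asym_to_s16_sym(q_u16, scale, zero_point):
    # Hypothetical helper, not QNN EP code: dequantize with the original
    # asymmetric uint16 params, then requantize symmetrically as int16.
    real = scale * (q_u16.astype(np.int64) - zero_point)
    # Symmetric int16 quantization: zero point is implicitly 0, so the
    # quantized value is just real / scale, clipped to the int16 range.
    q_s16 = np.clip(np.round(real / scale), -32768, 32767).astype(np.int16)
    return q_s16, scale

# When the uint16 zero point is exactly 32768, the conversion reduces to
# subtracting 32768 from every element, with no precision loss.
w_u16 = np.array([0, 1, 32768, 65535], dtype=np.uint16)
w_s16, s = convert_u16_asym_to_s16_sym(w_u16, 0.001, 32768)
```

Note that when the zero point is not centered (zp != 32768), part of the uint16 range maps outside int16 and gets clipped, which is one reason a dedicated Convert op (rather than a raw reinterpret) is needed.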